Hierarchical Relative Entropy Policy Search
Authors
Abstract
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that are strongly structured. Such task structures can be exploited by incorporating hierarchical policies that consist of gating networks and sub-policies. However, this concept has only been partially explored for real-world settings, and complete methods derived from first principles are needed. Real-world settings are challenging because their large, continuous state-action spaces are prohibitive for exhaustive sampling methods. We define the problem of learning sub-policies in continuous state-action spaces as finding a hierarchical policy composed of a high-level gating policy that selects the low-level sub-policies for execution by the agent. To share experience efficiently among all sub-policies, also called inter-policy learning, we treat these sub-policies as latent variables, which allows the update information to be distributed between the sub-policies. We present three variants of our algorithm, designed to suit a wide variety of real-world robot learning tasks, and evaluate them in two real robot learning scenarios as well as in several simulations and comparisons.
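The hierarchical structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the softmax gating, the linear-Gaussian sub-policies with a shared noise level, and all names (`gating_weights`, `sub_gains`, `sub_std`) are assumptions chosen for illustration. The sketch shows only how a gating policy selects a sub-policy and how, when the sub-policy index is treated as a latent variable, each sample can be softly assigned to every sub-policy through the posterior p(o|s,a). The relative-entropy-bounded policy update itself is not shown.

```python
import numpy as np

def gating_probs(state, gating_weights):
    """Gating policy pi(o|s): softmax over logits linear in the state (assumed form)."""
    logits = gating_weights @ state
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def sample_action(state, gating_weights, sub_gains, sub_std, rng):
    """Draw a sub-policy index o ~ pi(o|s), then an action a ~ N(K_o s, sub_std^2 I)."""
    probs = gating_probs(state, gating_weights)
    option = rng.choice(len(probs), p=probs)
    mean = sub_gains[option] @ state
    action = mean + sub_std * rng.standard_normal(mean.shape)
    return option, action

def responsibilities(state, action, gating_weights, sub_gains, sub_std):
    """Posterior p(o|s,a) over the latent index, proportional to pi(o|s) * pi(a|s,o)."""
    probs = gating_probs(state, gating_weights)
    means = sub_gains @ state  # shape (n_options, action_dim)
    sq_dist = ((action - means) ** 2).sum(axis=1)
    # The Gaussian normalizer is identical for all sub-policies (shared sub_std), so it cancels.
    log_post = np.log(probs) - 0.5 * sq_dist / sub_std**2
    log_post = log_post - log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

rng = np.random.default_rng(0)
state = np.array([1.0, -0.5])
gating_weights = rng.standard_normal((3, 2))  # 3 sub-policies, 2-D state
sub_gains = rng.standard_normal((3, 1, 2))    # one 1x2 gain matrix per sub-policy
option, action = sample_action(state, gating_weights, sub_gains, 0.1, rng)
resp = responsibilities(state, action, gating_weights, sub_gains, 0.1)
```

In this sketch the vector `resp` plays the role of the shared update information: a sample generated by one sub-policy can still contribute, with a smaller weight, to updating the others.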
Similar resources
Online learning in episodic Markovian decision processes by relative entropy policy search
We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a ...
Learning to Serve and Bounce a Ball
In this paper we investigate learning the tasks of ball serving and ball bouncing. These tasks display characteristics which are common in a variety of motor skills. To learn the required motor skills for these tasks the robot uses Relative Entropy Policy Search which is a state of the art method in Policy Search Reinforcement Learning. Our experiments show that REPS does not only converge cons...
Twenty Questions for Localizing Multiple Objects by Counting: Bayes Optimal Policies for Entropy Loss
We consider the problem of twenty questions with noiseless answers, in which we aim to locate multiple objects by querying the number of objects in each of a sequence of chosen sets. We assume a joint Bayesian prior density on the locations of the objects and seek to choose the sets queried to minimize the expected entropy of the Bayesian posterior distribution after a fixed number of questions...
A Hierarchical Approach in Multilevel Thresholding Based on Maximum Entropy and Bayes' Formula
An efficient hierarchical approach for image multi-level thresholding is proposed based on the maximum entropy principle and Bayes’ formula, in which no assumptions of the image histogram are made. Five forms of conditional probability distributions are employed for optimal threshold determination. Our experiments demonstrate that the proposed method is effective and achieves a significant impr...
A Relative Entropy Approach to Constructing Hierarchical Summaries
Hierarchies provide a means of organizing, summarizing and accessing information. This paper describes a method for automatically generating hierarchies from small collections of text. A formal framework is presented which uses relative entropy to identify words that are both topical and predictive of the vocabulary used to discuss the topics in the collection. These two features lead to the cr...
Journal title:
- Journal of Machine Learning Research
Volume 17, Issue
Pages -
Publication date 2012